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Abstract — In this paper, we firstly modify a parameter in 
affinity propagation (AP) to improve its convergence ability, and 
then, we apply it to vector quantization (VQ) codebook design 
problem. In order to improve the quality of the resulted code- 
book, we combine the improved AP (lAP) with the conventional 
LBG algorithm to generate an effective algorithm call lAP-LBG. 
According to the experimental results, the proposed method 
not only improves its convergence abilities but also is capable 
of providing higher-quality codebooks than conventional LBG 
method. 



I. Introduction 

Vector quantization (VQ) is an effective method in lossy 
data compression, and is widely used in the field of speech 
■ coding, image coding, video compression, etc. [1] [2] [3]. 
According to the VQ processes, we know that, for image 
compression, the quality of the reconstructed image highly 
depends on the quality of the codebook. Hence the codebook 
design is a very important task for VQ. Essentially, the 
codebook design is a clustering problem. Given a set of 
training vectors V — {Vn : n = 1,2, ... , N^, the goal is 
"to search for a map : V : {Vn : n = 1, 2, ... , C : 

, {Cm '■ m — 1,2, ... , M} ,N:^M which maps each training 
'vector (or training point) Vn to its cluster centroid C„,. This 
map shall minimize the cost function. Here, a simple squared 
Euclidean distance measure E = I]||y„ — C„i|p is used as 
, the cost function. 

A generalized algorithm was proposed by Linde, Buzo, 
and Gray (LBG) [4]. It is the most popular codebook 
design method. LBG iteratively applies two optimality 
'conditions (nearest neighbor condition and centroid 
condition) to generate a codebook. However, it suffers 
from local optimality and is sensitive to the initial solution. 
If the initial solution is poor, the resulted codebook's quality 
will probably be poor, and as a result it will be difficult to 
produce a high-quality image. 

Recently, a powerful algorithm called Affinity Propagation 
(AP) for unsupervised clustering was proposed by Frey and 
Dueck [5] . In AP algorithm, each point in a set is viewd as 
a node in a network. AP is based on message passing along 
edges of the network, following the idea of belief propagation 
[6] [7]. AP takes input real-value similarities s{n,m) which 
indicate how well the data point m is suited to be the 



cluster centroid to data point n, and then, two kinds of real- 
value messages "responsibility" r{n,m) and "availability" 
a{n, m) are exchanged among data points until a high-qulity 
set of cluster centroids and corresponding clusters gradually 
emerges [5]. Breifly, there are two significant advantages of 
AP: one is its high-quality clustering capabilty; the other is 
its computational efficiency, especially for large data sets [8]. 
However, in AP, for self-similarity is the same for each point, 
all data points are simultaneously considered as potential 
clustering centroids. Actually, this feature brings a drawback 
for AP, since it will be more difficult to converge. 

In this paper, we propose an improved AP (lAP) 
algorithm by modifying a parameter called network-support 
similarity ns{m,m) for each point which improves the 
algorithm's convergence abilities. In the original AP, all 
the self-similarities [5] are the same for all points,however, 
network-support similarity ns{m,m) for each point changes 
according to the network supports. As to a point in the 
network, network-support similarity equals to the average 
squared Euclidean distance from this point to the other 
points in the whole network. We consider that data points 
with larger values of network-support similarities are more 
probably to be chosen as clustering centroids. Also we offer a 
parameter called ratio of similarity rs to control the number 
of codewords needed. In addition, based on lAP algorithm, 
we propose an algorithm called lAP-LBG to effectively 
design the VQ codebook. Due to the strong clustering ability 
of lAP, the codebook's quality is further improved. 



II. LBG Algorithm 

Suppose that we want to design a codebook with M 
codewords from a training set V = {Vn : n ~ 1, 2, . . . , iV}, 
where N ^ M. The processes of the LBG algorithm are 
described as follows. 

Step 1 Randomly select M vectors from the training set 
V as the initial codebook C = {Cm ■ m = 
1,2,..., M}; 

Step 2 For Wn G V, find the closest codeword in the cur- 
rent codebook C according to the squared Euclidean 
distance E — \\Vn — Cm \ \ and then add the training 
vectors into the corresponding cluster of the closest 
codeword found. 



Step 3 For VCm G C, calculate the overall average dis- 
tortion D = Em=i E/=i £^ between the 
codeword and each training vector of the associated 
cluster. J is the total number of training vectors in 
the associated cluster. 

Step 4 If the difference in overall average distortion between 
the last two successive iterations is smaller than some 
threshold or a certain number of iterations is reached, 
then terminate the iteration. 

Step 5 For each codeword in the current codebook C, 
evaluate the centroid of its associated cluster and take 
the centroid as a new codeword for the next iteration. 
Go back to Step 2. 

The conventional LBG algorithm suffers from local opti- 
maUty and is sensitive to the initial solution. So if the initial 
solution is poor, the generated codebooks quality will probably 
be poor, and as a result it will be difficult to produce a high- 
quality image when decodeing. 



III. Affinity Propagation Algorithm 



AP takes as input similarities s{n,m) which indicate how 
well the data point m is suited to be the centroid for data point 
n. Here the similarity is set to be a negative squared Euchdean 
distance: 

s{n,m) = -\\Vn-Cmf. (1) 



s{m,m) indicates that data points with larger values are 
more likely to be chosen as clustering centroids. The number 
of the final examplars is influenced by the value of s{m,,m). 
In the conventional AP, all data points are simultaneously 
considered as potential examplars so the authors set all 
s{m,m) to be the same value [5]. 

In the processing, two kinds of message are exchanged 
among data points, and each takes into account a different 
kind of competition. The "responsibility" r{n,m), sent from 
data point n to candidate exemplar point to, reflects the 
accumulated evidence for how well-suited point m is to 
serve as the exemplar for point n, taking into account other 
potential exemplars for point n. The "availability" a{n, to) 
sent from candidate exemplar point to to point n, reflects the 
accumulated evidence for how fitting it would be for point 
n to choose point to as its exemplar, taking into account the 
support from other points that point to should be an exemplar. 
To begin with, the availabilities are initialized to zero, and in 
the whole process, they foUowes the updating rule below. 



r(n, to) = s(n, to) — max {a{n, to ) + s(n, to )} 

m s.t. m T^m 

(2a) 

r(m, m) = s(m, m) — max {a(m, m ) H- s(m, m )} 

m s.t. m T^m 

(2b) 

a(n, m) = min {O, r(m, m) + max {O, r(n , m)} 

n s.tn' ^n,m 

(2c) 

a{m,m) = ^ max {O, r(n , to)} (2d) 

n s.t. n 

Messages are updated on the basis of simple formula that 
search for minima of an appropriately chosen energy function. 
At any point in time, the magnitude of each message reflects 
the current affinity that that one data point has for choosing 
another data point as its exemplar. 

For point n, the value of to that maximizes 
a(n, to) -|- r(n, to) either identifies point n as an exemplar 
if TO = n, or identifies the data point that is the exemplar 
for point n [5]. The message-passing procedure may be 
terminated after a fixed number of iterations, after changes 
in the messages faU below a thereshold, or after the local 
decisions stay constant for some number of iterations. 

IV. Proposed Algorithm 

Since in the conventional AP, the authors consider that all 

data points can be equally suitable as exemplars, they set self- 
similarities of each point to be the same. However, we propose 
a different view of s{m,m). We argue that the self-similarity 
of each point should vary according to the similarities between 
this point and the others. A point may "love" to take itself 
as a exemplar more if it "knows" there are more other points 
choosing it to be a exemplar. We call this rule network-support 
similarities which, in this paper, is denoted as ns{m,m): 

Yl {s{m ,m)] 

ns{m,m) = J^'ZTi 

We consider that the point whose ns{m, to) is larger would 
be more appropriate to be an examplar. Because the cluster 
shape is regular in VQ codebook design, there is only one 
centroid for each cluster. As to a point, when more points 
support it to be a centroid, it should prefer to choosing itself as 
a centroid than other points. In order to get the very number of 
codewords, we set a tuning parameter called ratio of network- 
support similarities rs. And we find that the codeword number 
decreases monotonously with rs. 

s{m, m) = rs X ns{m, to) (4) 

To fine-tune the final solution, we use LBG algorithm 
after the process of the lAP. Since some codewords may be 
replaced, we must update the associated clusters to reduce 



distortion errors. Because lAP could generate good codebooks 
in the first step, the following LBG process would generate 
effective codebooks finally. 



can clearly see that it only takes lAP 71 iterations to 
converge, however, it takes AP more than 100 iterations. 
Moreover, the value of the energy function is almost the same. 



The process of the proposed algorithm is as follows. 

Step 1 Caculate similarities s{n,m) for all data points ac- 
cording to Eq.([T]i. 

Step 2 Caculate network-support similarities ns{m^m) for 
all data points according to Eq.(l3]i, and initialize 
rs = 0.1. 

Step 3 Initialize all a{n, ra) — r{n, m) — 0. 

Step 4 Vri e {l : N^, update the N responsibilities r{n, m) 
and then the N availabilities aim, n) parallelly ac- 
cording to Eqs.(|2]i. 

Step 5 Identify the exemplars Cm by looking at the maxi- 
mum value of r{n, m) + a{n, m) for given n. 

Step 6 Repeat Steps 4-5 until there is no change in exem- 
plars for a large number of iterations. 

Step 7 Modify rs and repeat Step 3-6 to get the codebook 
of the right size. 

Step 8 Use the codebook generated in Step 6 as an initial 
codebook then continue to use LBG method to 
generate the finial codebook. 

Compared with the original AP, firstly, our improved 
AP (lAP) could converge much faster (see Figure 1). Sec- 
ondly, our lAP-LBG algorithm could performance better on 
generating codebooks for VQ (see Table 1 and Table 2). 

V. Experimental Results 

Five 256 x 256 pixels monochrome images ("peppers", 
"lena", "bridge", "camera", and "baboon") with 256 gray 
levels are used to evaluate the effectiveness of the proposed 
algorithm. Firstly, we train respective codebooks of size 256 
with 16 dimensions using corresponding images, then we take 
the codebook of image "peppers" as a universal codebook and 
use all images to test its quality. In addition, in the process 
of generating the codebook of "peppers", we also compare 
the convergence abilities of lAP with the conventional AP 
algorithm. 

Performance comparisons are made among the conventional 
LBG algorithm [4], the conventional AP algorithm [5], the 
lAP algorithm and the lAP-LBG algorithm. For LBG, 
the maximum number of iterations is set to be 50, and 
threshold is 10^^ and the final results are obtained after 
20 runs. For AP, lAP and lAP-LBG, when updating the 
message, we also use a parameter called damping factor 
[5] to avoid numerical oscillations that arise in some 
circumstances. The value of damping factor is set to be 0.5 
in all of our experiments. As to AP, lAP and lAP-LBG, 
since rs is modified by the characters of the training sets, 
rs varies as image changes. We tune the parameter rs 
to obtain 256-size codebooks. They are given as follows, 
rs{peppers) = 0.086, rs (lena) — 0.14, rs{camera) — 
O.U9,rsibaboon) = 0.280, rs{bridge) = 0.203. 

Firstly, We compare the convergence abilities of lAP 
with the conventional AP algorithm. From Figure 1, we 
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Fig. L Comparisons on convergence abilities of AP and lAP 

Comparisons measured by PSNR (dB) on genarating code- 
books for the five different images are compared among the 
four methods. Results are shown in Table 1 and Table 2. The 
codebooks used in Table 1 are generated from the training sets 
accordingly, and the codebook used in Table 2 is generated 
from the training set of the "peppers". From Table 1, we 
can see that lAP-LBG method can improve the PSNR of the 
generated codebook by 0.62 dB compared with conventional 
AP, and 0.95 dB compared with conventional LBG averagely. 
From Table 2 we can see that lAP-LBG algorithm can improve 
the PSNR by 0.18 compared with conventional AP, and 0.28 
compared with conventional LBG averagely. In a word, the 
proposed algorithm in this paper is really effective. 



Algorithms 


LBG 


AP 


lAP 


lAP-LBG 


peppers 


3L18 


3L46 


31.38 


32.04 


lena 


29.67 


30.02 


29.94 


30.64 


camera 


25.95 


27.68 


27.69 


28.49 


baboon 


26.52 


26.11 


26.03 


26.71 


bridge 


25.48 


25.18 


25.07 


25.67 


average 


27.76 


28.09 


28.02 


28.71 



TABLE I 
PSNR (dB) within training set 



VI. Conclusions 

In this paper we propose a method called lAP-LBG 
which improves the quality of VQ codebook. Firstly we 
improve the convergence abilities of the conventional AP 
algorithm by modifying a parameter called network-support 
similarities, then take this codebook as initial codebook and 



Algorithms 


LBG 


AP 


lAP 


lAP-LBG 


peppers 


31.18 


31.46 


31.38 


32.04 


lena 


27.34 


27.40 


27.39 


27.54 


camera 


22.55 


22.79 


22.57 


22.75 


baboon 


24.97 


24.89 


25.01 


25.05 


bridge 


23.65 


23.68 


23.66 


23.74 


average 


25.94 


26.04 


26.00 


26.22 



TABLE II 
PSNR (dB) outside training set 



use the conventional LBG method to generate a high-quality 
codebook. In the experiment, performance comparisons made 
among five different images potently proved its effectiveness 
in reconstructed images quality. 
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